R has excellent graphics and plotting capabilities. They are mostly found in the following three sources:
+ base graphics
+ the lattice package
+ the ggplot2 package
Base R graphics uses a pen-and-paper model for plotting, while the lattice and ggplot2 packages are built on grid graphics.
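As a minimal sketch of what the pen-and-paper model means in practice: each call below draws on top of whatever is already on the page, and earlier marks are not erased. The vectors x and y are made up for this illustration and are not part of the tutorial data.

```r
# Hypothetical data, for illustration only
x <- 1:5
y <- c(2, 4, 3, 6, 5)

plot(x, y, type = "n")        # start a new page: axes and box, no data drawn yet
points(x, y, pch = 19)        # first layer: the points
lines(x, y, col = "blue")     # second layer: a line drawn over the points
abline(h = mean(y), lty = 2)  # third layer: dashed reference line at the mean of y
```

The same layering idea is used throughout the examples that follow, where plot() starts a figure and calls such as lines(), title() and legend() add to it.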
First we’ll produce a very simple graph using the values in a treatment vector:
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
plot(treatment)
Now, let’s add a title, a line to connect the points, and some color:
Here we plot treatment using blue points overlaid by a line
plot(treatment, type="o", col="blue")
Create a title with a red, bold/italic font
title(main="Treatment", col.main="red", font.main=4)
Put it all together, adding a second series from a control vector (assumed to be already defined, with the same length as treatment)
plot(treatment, type="o", col="blue", ylim=c(0,100))
lines(control, type="o", pch=22, lty=2, col="red")
title(main="Expression Data", col.main="red", font.main=4)
Next let’s change the axis labels to match our data and add a legend. We’ll also compute the y-axis limits using the range function so any changes to our data will be automatically reflected in our graph.
Calculate range from 0 to max value of data
g_range <- range(0, treatment, control)
range() returns a vector containing the minimum and maximum of all the given arguments.
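A small sketch of how range() combines several arguments may help here. The vectors a and b are hypothetical stand-ins for treatment and control; including 0 as an argument forces the lower limit down to zero when all the data are positive.

```r
# Hypothetical stand-ins for the treatment and control vectors
a <- c(3, 12, 7)
b <- c(5, 40, 1)

r <- range(0, a, b)  # minimum and maximum over 0, a and b together
print(r)             # 0 40
r[2]                 # the upper limit, equal to max(a, b)
```

Because the result is a length-two vector, it can be passed directly as ylim, and its second element can be reused to position the legend.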
Plot treatment using y axis that ranges from 0 to max value in treatment or control vector. Turn off axes and annotations (axis labels) so we can specify them ourselves.
plot(treatment, type="o", col="blue",
     ylim=g_range, axes=FALSE, ann=FALSE)
Make x axis using labels
axis(1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))
Make y axis with horizontal labels and tick marks every 20 units.
axis(2, las=1, at=seq(0, g_range[2], by=20))
Create box around plot
box()
Add control data, main title and x/y axis titles
lines(control, type="o", pch=22, lty=2, col="red")
title(main="Data", col.main="red", font.main=4)
title(xlab="Days", col.lab=rgb(0,0.5,0))
title(ylab="Values", col.lab=rgb(0,0.5,0))
Create a legend at (1, g_range[2]) that is slightly smaller (cex) and uses the same line colors and points used by the actual plots
legend(1, g_range[2], c("treatment","control"), cex=0.8,
       col=c("blue","red"), pch=21:22, lty=1:2)
Let’s start with a simple bar chart graphing the treatment vector:
Plot treatment
barplot(treatment)
Let’s now read the data from the example.txt data file, add labels, blue borders around the bars, and density lines:
Read values from tab-delimited example.txt
data <- read.table("data/example.txt", header=TRUE, sep="\t")
names.arg is a vector of names to be plotted below each bar or group of bars.
density is a vector giving the density of shading lines, in lines per inch, for the bars or bar components.
barplot(data$treatment, main="Treatment", xlab="Days", ylab="Values",
        names.arg=c("Mon","Tue","Wed","Thu","Fri","Sat"),
        border="blue", density=c(10,20,30,40,50,60))
Let’s start with a simple histogram plotting the distribution of the treatment vector:
Create a histogram for treatment
hist(treatment)
Concatenate the two vectors
all <- c(data$control, data$treatment)
Create a histogram for data in light blue with the y axis ranging from 0-10
hist(all, col="lightblue", ylim=c(0,10))
Now we can configure the groups in the histogram using the breaks parameter.
For breaks we can supply either a single number giving the suggested number of cells for the histogram, or a vector giving the breakpoints between cells.
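The two forms of breaks can be compared without drawing anything by using plot=FALSE, which returns the computed bins. This is only a sketch: the vector x is hypothetical and not part of the tutorial data.

```r
# Hypothetical values, for illustration only
x <- c(1, 2, 2, 3, 5, 8, 9, 9, 10)

# A vector of breakpoints is used exactly as given: bins [0,5] and (5,10]
h_vec <- hist(x, breaks = c(0, 5, 10), plot = FALSE)
h_vec$counts   # 5 4

# A single number is treated as a suggestion; R picks "pretty" breakpoints
h_num <- hist(x, breaks = 3, plot = FALSE)
h_num$breaks   # the breakpoints R actually chose
```

When the exact bin edges matter, passing an explicit vector is the more predictable choice.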
Compute the largest value used in the data
max_num <- max(all)
Here we create a histogram setting breaks so each number is in its own group and make the x axis range from 0-max_num.
hist(all, col=heat.colors(max_num), breaks=max_num,
     xlim=c(0,max_num), main="Histogram", las=1)
Here we set the freq parameter to FALSE to plot probability densities; with TRUE the histogram shows frequencies (counts) instead.
hist(all, breaks=max_num, xlim=c(0,max_num),
     main="Probability Density", las=1, cex.axis=0.8, freq=FALSE)
Now let’s add a heading, change the colours, and define our own labels:
Create a pie chart with defined heading and custom colours and labels
pie(treatment, main="Treatment",
    col=c("lightblue","mistyrose","lightcyan","lavender","cornsilk","maroon"),
    labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))
Now let’s change the colours, label using percentages, and create a legend:
Define six colours (one per day) suited to black & white print
colors <- c("white","grey70","grey90","grey50","grey30","black")
Calculate the percentage for each day, rounded to one decimal place
treatment_labels <- round(treatment/sum(treatment) * 100, 1)
Concatenate a ‘%’ char after each value
treatment_labels <- paste(treatment_labels, "%", sep="")
Create a pie chart with defined heading and custom colors and labels
pie(treatment, main="Treatment", col=colors, labels=treatment_labels,
    cex=0.8)
Create a legend at the right
legend(1.5, 0.5, c("Mon","Tue","Wed","Thu","Fri","Sat"), cex=0.8,
       fill=colors)
Let’s start with a simple dot chart graphing the data:
Here we use the function t to return the transpose of a matrix.
dotchart(t(data))
Let’s make the dotchart a little more colorful:
Now we create a colored dotchart for our data with smaller labels
dotchart(t(data), color=c("red","blue","darkgreen"),
         main="Dotchart", cex=0.8)
The final plot we will look at is a box and whisker plot.
Boxplots allow you to quickly review data distributions, showing the median, the 1st and 3rd quartiles, and any outliers.
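To see the numbers behind a box, boxplot() can be called with plot=FALSE, which returns the summary statistics instead of drawing. The sketch below uses a made-up vector x. Note that R computes the box edges as "hinges", which are close to, and here equal to, the 1st and 3rd quartiles.

```r
# Hypothetical values, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

b <- boxplot(x, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats[, 1]   # 1 3 5 7 9

# Points beyond the whiskers (none for this vector)
b$out
```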
First let’s read in the gene expression data
exprs <- read.delim("data/gene_data.txt", sep="\t", header=TRUE, row.names=1)
head(exprs)
## Untreated1 Untreated2 Treated1 Treated2
## ENSDARG00000093639 0.8616832 1.9311442 0.1041508 0.14055604
## ENSDARG00000094508 0.9857575 2.0256352 0.1549917 0.20301609
## ENSDARG00000095893 0.8498889 1.9875580 0.2317969 0.20925123
## ENSDARG00000095252 0.9242996 2.0857620 0.2562264 0.24669079
## ENSDARG00000078878 0.3571734 0.4653908 0.1167221 0.09710237
## ENSDARG00000079403 1.0604071 1.2581398 0.3884836 0.31567299
Now we can use the boxplot() function on our data.frame to get our boxplot
boxplot(exprs)
Perhaps it would look better on a log scale. We can add additional colours and labels as with other plots.
boxplot(log2(exprs), ylab="log2 Expression",
        col=c("red","red","blue","blue"))
Here, we will use a different data frame, with two columns each for treated and untreated samples (the same file, read without row.names so the gene IDs stay in an ordinary column).
data1 <- read.table("data/gene_data.txt", header=TRUE, sep="\t")
head(data1)
## ensembl_gene_id Untreated1 Untreated2 Treated1 Treated2
## 1 ENSDARG00000093639 0.8616832 1.9311442 0.1041508 0.14055604
## 2 ENSDARG00000094508 0.9857575 2.0256352 0.1549917 0.20301609
## 3 ENSDARG00000095893 0.8498889 1.9875580 0.2317969 0.20925123
## 4 ENSDARG00000095252 0.9242996 2.0857620 0.2562264 0.24669079
## 5 ENSDARG00000078878 0.3571734 0.4653908 0.1167221 0.09710237
## 6 ENSDARG00000079403 1.0604071 1.2581398 0.3884836 0.31567299
Plot histograms for the different columns in the data frame separately. This is not very efficient; you could also do it more compactly using a for loop.
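A minimal sketch of the loop-based alternative is shown first. The data frame df is a hypothetical stand-in with the same column names as data1 (random values, not the real expression data), and the vector sample_cols is introduced here only to skip the gene ID column.

```r
# Hypothetical stand-in for data1: same column names, random values
set.seed(1)
df <- data.frame(
  ensembl_gene_id = paste0("gene", 1:20),
  Untreated1 = rnorm(20), Untreated2 = rnorm(20),
  Treated1   = rnorm(20), Treated2   = rnorm(20)
)

# Only the numeric sample columns are plotted
sample_cols <- c("Untreated1", "Untreated2", "Treated1", "Treated2")

op <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels; keep the old settings in op
for (col_name in sample_cols) {
  hist(df[[col_name]], main = col_name, xlab = "Expression")
}
par(op)                      # restore the previous layout
```

The explicit version, with one call per column, looks like this: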
par(mfrow=c(2,2))
hist(data1$Untreated1)
hist(data1$Treated2)
hist(data1$Untreated2)
hist(data1$Treated1)
Saving in bitmap format
bmp(filename = "control.bmp")
plot(control)
dev.off()
Saving in postscript format
postscript(file = "control.ps")
plot(control)
dev.off()
An exercise on base plotting can be found here.
Answers for the base plotting exercise can be found here.